[1] "species" "island" "bill_length_mm"
[4] "bill_depth_mm" "flipper_length_mm" "body_mass_g"
[7] "sex" "year"
Principal Data Scientist @ Jumping Rivers:
Project management.
Python & machine learning support for clients.
Teach courses in programming, SQL, ML.
Organise North East & Leeds data science meetups.
↗ jumpingrivers.com 𝕏 @jumping_uk
How I got into Data Science
First encounter with MLOps
Getting to grips with Vetiver (code examples)
Take home lessons
PhD in Astrophysics (started 2017)
Extra training in “Data Intensive Science”
… academia is hard
Joined Jumping Rivers full time in 2022 (following an internship)
My initial experience:
Software development (check out diffify.com)
Course writing and teaching
LOTS of merge requests
Conferences and meetups
Data science is not always about machine learning!
The dreaded architecture diagram…
Countless permutations
Very multidisciplinary
Expensive
Palmer Penguins dataset
Using {tidyr} and {rsample}:
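A minimal sketch of this preparation step, assuming the data comes from the {palmerpenguins} package. The 80/20 split proportion, the seed and the object names are illustrative choices, not taken from the slides.

```r
# Take the data explicitly from {palmerpenguins}
penguins <- palmerpenguins::penguins

# Drop rows with missing values
penguins_clean <- tidyr::drop_na(penguins)

# Reproducible train/test split (seed and proportion are arbitrary)
set.seed(123)
data_split <- rsample::initial_split(penguins_clean, prop = 0.8)
train_data <- rsample::training(data_split)
test_data <- rsample::testing(data_split)
```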
Convert our {tidymodels} model to a {vetiver} model:
Contains all the info needed to version, store and deploy our model!
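One way the conversion could look. The model itself is an assumption: the slides do not show it, so this sketch fits a decision tree that predicts species from three of the dataset's columns. The name "penguins_model" is also hypothetical.

```r
library(tidymodels)
library(vetiver)

penguins_clean <- tidyr::drop_na(palmerpenguins::penguins)

# Assumed model (not from the slides): decision tree predicting species
model <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_clean)

# Wrap the fitted workflow in a deployable vetiver model object
v_model <- vetiver_model(model, model_name = "penguins_model")
v_model
```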
Retrieve a model
Inspect the stored versions
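Storing, retrieving and inspecting versions can be sketched with {pins}. A temporary board stands in for a real one here, and the model and pin name are the same illustrative assumptions as before.

```r
library(tidymodels)
library(vetiver)

penguins_clean <- tidyr::drop_na(palmerpenguins::penguins)

# Assumed model (not from the slides): decision tree predicting species
model <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_clean)
v_model <- vetiver_model(model, model_name = "penguins_model")

# Temporary, versioned board: a stand-in for a shared or cloud board
board <- pins::board_temp(versioned = TRUE)

# Store the model
vetiver_pin_write(board, v_model)

# Retrieve a model (the latest version by default)
v_retrieved <- vetiver_pin_read(board, "penguins_model")

# Inspect the stored versions
versions <- pins::pin_versions(board, "penguins_model")
versions
```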
We deploy models as APIs which take input data and send back model predictions.
APIs can be hosted at public endpoints on the web.
We can run them on the localhost (during testing / development).
{vetiver} uses {plumber} to create a model API.
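A sketch of building the API locally. The port (8080) and the model are assumptions. The calls that start the server and query it are left as comments, because pr_run() blocks the R session until it is stopped.

```r
library(tidymodels)
library(vetiver)

penguins_clean <- tidyr::drop_na(palmerpenguins::penguins)

# Assumed model (not from the slides): decision tree predicting species
model <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = penguins_clean)
v_model <- vetiver_model(model, model_name = "penguins_model")

# Build a plumber router with a POST /predict endpoint for the model
api <- plumber::pr() |>
  vetiver_api(v_model)

# Start the API on the localhost (blocks the session; port is arbitrary):
# plumber::pr_run(api, port = 8080)

# From a second R session, while the API is running:
# endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")
# new_penguins <- penguins_clean[1:3, c("island", "flipper_length_mm", "body_mass_g")]
# predict(endpoint, new_penguins)
```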
As our data grows, we should check model performance.
Monitor key model metrics over time using vetiver::vetiver_compute_metrics()
Store model metrics: vetiver::vetiver_pin_metrics()
Plot the metrics: vetiver::vetiver_plot_metrics()
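A rough sketch of the three monitoring steps. Several things here are assumptions made for illustration: the survey year stands in for an observation date, data from 2009 plays the role of "new" data, the model is a decision tree, and the pin name "penguins_metrics" is hypothetical.

```r
library(tidymodels)
library(vetiver)

penguins_dated <- tidyr::drop_na(palmerpenguins::penguins) |>
  # Assumption: use the survey year as a stand-in observation date
  dplyr::mutate(date = as.Date(paste0(year, "-01-01"))) |>
  dplyr::arrange(date)

old_data <- dplyr::filter(penguins_dated, year < 2009)
new_data <- dplyr::filter(penguins_dated, year == 2009)

# Assumed model (not from the slides), trained on the older data only
model <- workflow() |>
  add_formula(species ~ island + flipper_length_mm + body_mass_g) |>
  add_model(decision_tree(mode = "classification")) |>
  fit(data = old_data)
v_model <- vetiver_model(model, model_name = "penguins_model")

# Helper (hypothetical): add predictions, then compute metrics per year
compute_yearly_metrics <- function(data) {
  augment(v_model, new_data = data) |>
    vetiver_compute_metrics(
      date_var = date,
      period = "year",
      truth = species,
      estimate = .pred_class
    )
}

old_metrics <- compute_yearly_metrics(old_data)
new_metrics <- compute_yearly_metrics(new_data)

# First time: create the metrics pin, then append the new metrics to it
board <- pins::board_temp()
pins::pin_write(board, old_metrics, "penguins_metrics")
vetiver_pin_metrics(board, new_metrics, "penguins_metrics")

# Read everything back and plot metrics over time
all_metrics <- pins::pin_read(board, "penguins_metrics")
metrics_plot <- vetiver_plot_metrics(all_metrics)
metrics_plot
```

In real monitoring the "new" rows would be freshly collected, labelled observations rather than a slice of the original dataset.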
Over time we may notice a drop in performance…
Life as a Data Scientist isn’t always about machine learning!
Architecture diagrams can be incredibly useful.
… but do consider your target audience!
You can get started on MLOps right now with free and open source tools.
Consider whether it is worth the cost/effort before investing in cloud infrastructure.